The Plant Genome
Wiley
All preprints, ranked by how well they match The Plant Genome's content profile, based on 53 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Tomura, S.; Powell, O. M.; Wilkinson, M. J.; Cooper, M.
While various genomic prediction models have been evaluated for their potential to accelerate genetic gain for multiple traits, no individual genomic prediction model has outperformed all others across all applications. As an alternative approach, ensembles of multiple individual genomic prediction models can be applied to utilise the complementary strengths of individual prediction models and offset the prediction errors of each. We used the EasiGP (Ensemble AnalySis with Interpretable Genomic Prediction) pipeline to investigate the performance of an ensemble approach, targeting flowering-time traits measured in two maize nested association mapping datasets. For both datasets, the ensemble-based prediction approach achieved higher prediction accuracy and lower prediction error across the flowering-time traits compared to each individual model. Multiple genomic regions known to contain key flowering-time related genes were repeatedly included as features across individual genomic prediction models, indicating that the models successfully captured SNPs associated with those regions. Although repeatability was high for some genomic regions, estimated marker effects varied across many genomic regions, suggesting that the models might also have captured different aspects of the genetic variation underlying the traits. The ensemble combination of these diverse views likely contributed to the improvement of prediction performance by the ensemble-based approach over the individual prediction models. Ensemble-based prediction can be applied to sidestep the ongoing search for a single genomic prediction model that consistently achieves the highest prediction performance, thereby potentially contributing to improved prediction accuracy for applications in crop breeding.
Article summary: This study targets researchers interested in the performance of genomic prediction models. To demonstrate potential advantages of an ensemble of diverse individual genomic prediction models, we investigated the prediction of key flowering-time traits (days to anthesis and anthesis to silking interval) in two maize datasets. The ensemble approach consistently improved the prediction performance. The improvement was attributed to the offset of prediction errors by combining multiple different dimensions of trait genetic variation. Ensembles can lead to higher selection accuracy of desirable individuals for applications in crop breeding.
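The error-offsetting idea in this abstract can be illustrated with a minimal sketch. It assumes a plain unweighted average of model predictions; the abstract does not say how EasiGP combines models, and the trait values, `model_a`, and `model_b` below are hypothetical numbers chosen so that the two models' errors partly cancel.

```python
from statistics import mean


def ensemble_mean(model_predictions):
    """Unweighted average of per-individual predictions across models."""
    return [mean(values) for values in zip(*model_predictions)]


def mse(predicted, observed):
    """Mean squared prediction error."""
    return mean((p - o) ** 2 for p, o in zip(predicted, observed))


# Hypothetical days-to-anthesis values for four lines (not from the study).
observed = [60.0, 62.0, 65.0, 70.0]
model_a = [62.0, 61.0, 67.0, 68.0]  # errors: +2, -1, +2, -2
model_b = [58.0, 64.0, 64.0, 71.0]  # errors: -2, +2, -1, +1

ensemble = ensemble_mean([model_a, model_b])

print(mse(model_a, observed))   # 3.25
print(mse(model_b, observed))   # 2.5
print(mse(ensemble, observed))  # 0.1875
```

The ensemble beats both models here only because their errors point in different directions; if the models made the same mistakes, averaging would not help.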
Orhobor, O. I.; Alexandrov, N. N.; Chebotarov, D.; Kretzschmar, T.; McNally, K. L.; Sanciangco, M. D.; King, R. D.
To secure the world's food supply it is essential that we improve our knowledge of the genetic underpinnings of complex agronomic traits. In this paper, we report our findings from performing trait prediction and association mapping using marker stability in diverse rice landraces. We used the least absolute shrinkage and selection operator as our marker selection algorithm, and considered twelve real agronomic traits and a hundred simulated traits using a population with approximately a hundred thousand markers. For trait prediction, we considered several statistical/machine learning methods. We found that some of the methods considered performed best when markers preselected using marker stability were used. However, our results also show that one might need to make a trade-off between model size and performance for some learning methods. For association mapping, we compared marker stability to the genome-wide efficient mixed-model analysis (GEMMA), and for the simulated traits, we found that marker stability significantly outperforms GEMMA. For the real traits, marker stability successfully identifies multiple associated markers, which often include those selected by GEMMA. Further analysis of the markers selected for the real traits using marker stability showed that they are located in known quantitative trait loci (QTL) using the QTL Annotation Rice Online database. Furthermore, co-functional network prediction of the selected markers using RiceNet v2 also showed association to known controlling genes. We argue that a wide adoption of the marker stability approach for the prediction of agronomic traits and association mapping could improve global rice breeding efforts.
Santos Junior, D. R. d.; Fe, D.; Lenk, I.; Jensen, C. S.; Asp, T.; Janss, L.; Bornhofen, E.
The performance of a single cross is determined by the average additive effects of the parents, as well as the interactions between them. These quantities can be estimated using an appropriate genetic design, allowing for the estimation of general (GCA) and specific (SCA) combining abilities. The prediction of GCA for new parents and the total genetic value of unrealized crosses can be made when genome-wide marker information is available. Several studies in crops such as maize and rice have demonstrated the potential of genomic-assisted prediction of single-cross performance in economically important crops. However, no study to date has explored its relevance in perennial ryegrass, an obligate allogamous species that is bred in genetically heterogeneous families. In this study, we aimed to estimate genetic parameters and assess the ability of genomic models to predict the performance of F2 families in terms of dry matter yield and nutritive quality traits. We used data from a large partial diallel involving 104 parents from two distinct subpopulations, as inferred by admixture analysis. F2 families were evaluated in multiple environments and under two nitrogen availability conditions. Genotyping-by-sequencing of the parent plants produced 42,145 variants after quality control, which were used to estimate genomic relationships based on identity-by-state. Variance component estimation revealed limited GCA and SCA interactions with the environment, and particularly with nitrogen management. The predictive abilities of two parental models exceeded 0.60 and often surpassed 0.70 for most traits. However, incorporating non-additive effects into the model did not improve predictive ability. We leveraged the genetic diversity among parents to map genomic regions associated with all recorded traits. 
Genome-wide association studies (GWAS) by genomic best linear unbiased prediction (GBLUP) identified six quantitative trait loci (QTL) regions, with 45 candidate genes within the linkage disequilibrium range, estimated at approximately 92 kb. Our results demonstrate that genomic prediction of single crosses can be performed with high accuracy, especially when both parents are also progenitors of families in the training set.
Rebollo, I.; Tolhurst, D.; Obsteter, J.; Rosas, J. E.; Gorjanc, G.
Rice (Oryza sativa L.) has two main subspecies, indica and japonica, which coexist in many regions but are often treated separately during breeding. Combining both subspecies in quantitative genetic analyses could enhance genetic improvement; however, this requires appropriately modelling their genetic history. The ancestral recombination graph (ARG) is an effective population genetics tool that comprehensively and succinctly represents a species' genetic history. This study evaluated the use of an ARG, encoded as a tree sequence, to improve quantitative genetic analyses of indica and japonica rice. Using data from Uruguay's National Rice Breeding Program, we inferred ancestral alleles, constructed and dated an ARG, and examined its application in genomic prediction and genome-wide association studies. We compared the predictive ability of a branch-based relationship matrix (BRM) built from an ARG against conventional relationship matrices from pedigree and single nucleotide polymorphism (SNP) site data. We then estimated the BRM's SNP site effects to identify potential sites of interest and better understand how these map onto the tree sequence branches. The results showed that the ARG captured key biological signals, encoded genomic data more efficiently than conventional formats, and resulted in the highest predictive ability when combining both subspecies. Although the ARG-based approach did not substantially outperform conventional approaches for between-species prediction, this approach holds promise for plant breeding with larger datasets and could enhance genome-wide association studies by elucidating haplotype ancestry and the evolution of their value. Overall, our results demonstrated the potential of ARGs for the quantitative genetic analysis of diverse populations.
Kuster, R. D.; Sisler, P.; Sandhu, K.; Yin, L.; Niece, S.; Krueger, R.; Dardick, C.; Keremane, M.; Ramadugu, C.; Staton, M. E.
Background: Pangenomes are a promising new approach to genomics that can reduce reference bias in genotyping, but the reliability of such a data model remains unclear in tracking variation across species. To test the utility of graph-based pangenomes for interspecific breeding, we developed a Minigraph-Cactus super-pangenome representing four Citrus species derived from the founder lines of a citrus breeding program. To benchmark SNP calling accuracy using graph and linear-based approaches, we performed whole genome short read sequencing for two sets of pedigreed progeny: 30 F1 hybrids and 244 advanced hybrids from an F1 crossed with a parent not included in the pangenome. Results: The linear approach yielded more SNP calls than the graph-based approach; however, both methods exhibited similar Mendelian Inheritance Error Rates (MIER) in a tool-dependent manner. Reconstruction of parental haplotype blocks in the advanced hybrids revealed a striking improvement in performance in the pangenome graph-based calls, suggesting MIER is vulnerable to error when reference bias influences both parental and progeny genotype calls. Masking of regions diverged from the reference path improved MIER accuracy metrics and haplotype block reconstruction in both the linear and graph-based SNP calls. Conclusions: In non-model systems, inheritance patterns observed from pedigreed hybrids provide a framework for benchmarking variant-calling accuracy using pangenomes. SNP miscalls originating from diverged regions can falsely satisfy MIER filters, so we recommend benchmarking with haplotype blocks. The inherent structure of the pangenome graph has promising applications for removing regions of unreliable mapping quality, which cannot otherwise be reliably removed using traditional filtering metrics.
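For readers unfamiliar with MIER, the following is a minimal sketch under stated assumptions. It assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2) and the common trio definition, in which a call is an error if the child genotype cannot be formed from one allele of each parent. The abstract does not give the exact definition or tooling used, and the function names and example trios are hypothetical.

```python
def possible_gametes(genotype):
    """Alternate-allele counts (0 or 1) a diploid parent can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(parent1, parent2, child):
    """True if the child genotype cannot arise from the two parents."""
    allowed = {a + b
               for a in possible_gametes(parent1)
               for b in possible_gametes(parent2)}
    return child not in allowed


def mier(trios):
    """Mendelian inheritance error rate over fully called trios.

    trios: iterable of (parent1, parent2, child); None marks a missing call.
    """
    called = [t for t in trios if None not in t]
    if not called:
        return float("nan")
    return sum(is_mendelian_error(*t) for t in called) / len(called)


# Hypothetical calls at six sites for one trio (not from the study).
trios = [
    (0, 0, 0),     # consistent
    (0, 2, 1),     # consistent
    (0, 0, 1),     # error: no parent carries the alternate allele
    (1, 1, 2),     # consistent
    (2, 2, 0),     # error: both parents can only transmit the alternate allele
    (0, 1, None),  # missing child call, skipped
]
print(mier(trios))  # 0.4
```

The first trio shows the weakness the abstract describes: if reference bias pushes both parents and the progeny toward the same homozygous call, the site looks consistent even when the calls are wrong.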
Kitony, J. K.; Reyes, V. P.; Sunohara, H.; Tasaki, M.; Yamasaki, M.; Mori, J.-i.; Shimazu, A.; Nishiuchi, S.; Michael, T. P.; Doi, K.
Genomic selection (GS) can accelerate genetic gain in crops, but its effectiveness depends on training population design and marker density. Nested association mapping (NAM) populations provide a structured framework that captures broad allelic diversity within a controlled genetic background. Here, we evaluated genomic prediction (GP) and genome-wide association study (GWAS) performance in an expanded aus-NAM population of rice comprising 1,818 recombinant inbred lines across 14 families and 11 agronomic traits, using genotyping-by-sequencing (GBS) markers and projected whole-genome sequence variants. Prediction accuracy plateaued at moderate marker densities (~20k SNPs) and with training populations of ~500 lines (~40-60% of the available pool), with trait heritability emerging as the strongest determinant of predictive performance rather than model choice or marker density. In contrast, GWAS resolution continued to improve with increasing marker density, enabling detection of additional loci, including a chromosome 12 locus associated with heading date, while consistently recovering well-characterized genes such as EARLY HEADING DATE 1 (Ehd1) and SEMIDWARF 1 (SD1). These contrasting patterns indicate that GP reaches near-optimal performance once genome-wide variation is adequately represented, whereas GWAS benefits from higher marker density through improved locus resolution. The present study establishes a benchmark for implementing breeding programs involving japonica/indica crosses using GP in a single environment.
Godoy, J. C.; Edwards, J.; Lee, E. C.; Mikel, M. A.; Fernandes, S. B.; Hirsch, C. N.; Berry, S. P.; Lipka, A. E.; Bohn, M. O.
The early 20th-century discovery of heterosis and the establishment of heterotic groups transformed maize (Zea mays L.) into a keystone of global agriculture. However, maize breeding faces two significant challenges: the gradual decline of general combining ability (GCA) variance within heterotic groups and the impracticality of testing all possible single crosses in the early stages of a breeding program. Here, we developed genomic best linear unbiased prediction (GBLUP)-based multi-kernel models, using additive and two alternative non-additive genomic relationship matrices, to estimate the variance components associated with the GCA of Stiff Stalk (SS) and Non-Stiff Stalk (NSS) heterotic groups and the specific combining ability (SCA) arising from their crosses. We further applied these models to predict the performance of untested single-cross combinations under varying levels of parental information. We showed that the SS and NSS groups retained significant GCA variance across traits in both early- and late-maturity groups. The SS group, in contrast, exhibited no detectable GCA variance in grain yield for the intermediate-flowering subset of hybrids, highlighting a limitation for future genetic improvement. Furthermore, our results showed that GBLUP-based multi-kernel models effectively identified superior hybrids when parental information was available. In the absence of this information, however, these models underperformed compared to covariance-based approaches. Both types of non-additive matrices produced similar results, affirming the robustness of the inferred genetic architecture. Overall, this study sheds light on the future use of US maize commercial germplasm and demonstrates how GBLUP-based multi-kernel models can improve the efficiency of hybrid breeding programs.
Vidigal, P. M. P.; Momen, M.; Costa, P. M. A.; Barbosa, M. H. P.; Morota, G.; Peternelli, L. A.
Background: The identification of genomic regions involved in agronomic traits is the primary concern for sugarcane breeders. Genome-wide association studies (GWAS) leverage the sequence variations to bridge phenotypes and genotypes. However, their effectiveness is limited in species with high ploidy and large genomes, such as sugarcane. As an alternative, a regional heritability mapping (RHM) method can be used to capture genetic signals that may be missed by GWAS by combining genetic variance from neighboring regions. We used RHM to screen the sugarcane genome aiming to identify regions with higher heritability associated with agronomic traits. We considered percentage of fiber in sugarcane bagasse (FB), apparent percentage of sugarcane sucrose (PC), tonnes of pol per hectare (TPH), and tonnes of stalks per hectare (TSH). Methods: Sequence-capture data of 508 sugarcane (Saccharum spp.) clones from a breeding population under selection were processed for variant calling analysis using the sugarcane genome cultivar R570 as a reference. A set of 375,195 single nucleotide polymorphisms were selected after quality control. RHM was conducted by splitting the sugarcane genome into windows of 2 Mb length. Results: We selected the windows explaining > 20% of the total genomic heritability for TPH (64 windows - 5,654 genes) and TSH (72 windows - 6,050 genes), and > 15% for PC (16 windows - 1,517 genes) and FB (17 windows - 1,615 genes). The top five windows that explained the highest genomic heritability ranged from 20.8 to 24.6% for FB (629 genes), 18.0 to 22.0% for PC (452 genes), 53.8 to 66.0% for TPH (705 genes), and 59.5 to 67.4% for TSH (413 genes). The functional annotation of genes included in those top five windows revealed a set of genes that encode enzymes that integrate carbon metabolism, starch and sucrose metabolism, and phenylpropanoid biosynthesis pathways.
Conclusions: The selection of windows that explained large proportions of genomic heritability allowed us to identify genomic regions containing a set of genes that are related to the agronomic traits in sugarcane. These windows spanned a region of 58.38 Mb, which corresponds to 14.28% of the reference assembly in the sugarcane genome. We contend that RHM can be used as an alternative method for sugarcane breeders to reduce the complexity of the sugarcane genome.
Zhan, S.; Raherison, E.; Hargreaves, W.; Hughes, N.; Goessen, R.; Majidi, M. M.; Knox, R.; Cuthbert, R.; Lukens, L.
Background: Genetic variation of regulatory alleles plays a key role in evolution and breeding. In polyploids, regulatory differences may preferentially affect genes on homoeologous chromosomes or sub-genomes. Selection in plant breeding may act upon total transcript dosage across homoeologous genes and on alleles that have strong effects on the transcriptome. Results: To investigate these questions, we identified regulatory polymorphisms between an old and a recent hexaploid bread wheat cultivar (Triticum aestivum, 2n=6x=42, AABBDD). The recent cultivar was the product of decades of selection for grain yield and quality. Regulatory allele polymorphisms preferentially affected genes on homoeologous chromosomes but rarely affected genes on specific sub-genomes. The chromosomal distributions of regulatory alleles indicated that past selection had acted upon them, and the effect of selection differed between alleles targeting environmental response genes and genes involved in other processes. Modern cultivar alleles that affected many genes' transcripts corresponded to known selection targets and improved field crop performance. Modern cultivar alleles also had significant effects on homoeologous genes, and these alleles also improved crop performance. Conclusions: Polyploid breeding across many species has been and will continue to be the key factor in plant improvement. By enhancing the favorability of strong regulatory alleles and by expanding the range of gene transcript abundances, genome duplications enable breeding progress.
Ramstein, G. P.; Larsson, S. J.; Cook, J. P.; Edwards, J.; Ersoz, E. S.; Flint-Garcia, S.; Gardner, C. A.; Holland, J. B.; Lorenz, A. J.; McMullen, M. D.; Millard, M. J.; Rocheford, T. R.; Tuinstra, M. R.; Bradbury, P.; Buckler, E. S.; Romay, M. C.
Heterosis has been key to the development of maize breeding but describing its genetic basis has been challenging. Previous studies of heterosis have shown the contribution of within-locus complementation effects (dominance) and their differential importance across genomic regions. However, they have generally considered panels of limited genetic diversity and have shown little benefit to including dominance effects for predicting genotypic value in breeding populations. This study examined within-locus complementation and enrichment of genetic effects by functional classes in maize. We based our analyses on a diverse panel of inbred lines crossed with two testers representative of the major heterotic groups in the United States (1,106 hybrids), as well as a collection of 24 biparental populations crossed with a single tester (1,640 hybrids). We assayed three agronomic traits: days to silking (DTS), plant height (PH) and grain yield (GY). Our results point to the presence of dominance for all traits, but also among-locus complementation (epistasis) for DTS and genotype-by-environment interactions for GY. Consistently, dominance improved genomic prediction for PH only. In addition, we assessed enrichment of genetic effects in classes defined by genic regions (gene annotation), structural features (recombination rate and chromatin openness), and evolutionary features (minor allele frequency and evolutionary constraint). We found support for enrichment in genic regions and subsequent improvement of genomic prediction for all traits. Our results point to mechanisms by which heterosis arises through local complementation in proximal gene regions and suggest the relevance of dominance and gene annotations for genomic prediction in maize.
You, F. M.; Zheng, C.; Zagariah Daniel, J. J.; Li, P.; Jackle, K.; House, M.; Tar'an, B.; Cloutier, S.
Genomic selection (GS) is a promising strategy to improve breeding efficiency for complex traits such as seed yield by enabling early selection and reducing reliance on extensive field testing. However, practical deployment of GS remains challenging due to limited training population sizes and reduced prediction accuracies when models are applied to true breeding germplasm. In this study, we evaluated GS for flax (Linum usitatissimum L.) seed yield under realistic breeding scenarios, with a focus on across-population prediction (APP) and breeding decision support rather than model benchmarking. Using historical germplasm collections and a newly developed breeding-oriented population as training sets, GS performance was assessed across multiple independent test populations representing contemporary breeding lines evaluated in replicated yield trials. APP accuracies reached r = 0.84 when training and test populations were genetically aligned, supporting routine breeding deployment. Training population composition emerged as a key determinant of prediction success, with breeding-oriented populations consistently outperforming broad germplasm collections for predicting true breeding lines. Check-based selection analyses showed that GS reliably reproduced phenotypic advancement decisions while eliminating 61-91% of low-performing lines, resulting in 48-78% reduction in field evaluation costs for a typical cohort of 300 lines. Marker subsampling analyses further indicated that moderate-density genotyping-by-sequencing panels (~2,500-3,000 SNPs) are sufficient to achieve stable prediction accuracies. Overall, these results demonstrate that GS for seed yield in flax is ready for routine integration into breeding programs, offering a practical pathway to reduce costs, accelerate breeding cycles, and enhance selection efficiency.
Qi, W.; Lim, Y.-W.; Patrignani, A.; Schlaepfer, P.; Bratus-Neuenschwander, A.; Grueter, S.; Chanez, C.; Rodde, N.; Prat, E.; Vautrin, S.; Fustier, M.-A.; Pratas, D.; Schlapbach, R.; Gruissem, W.
Background: Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and sub-tropical regions worldwide. Genetic gain by molecular breeding is limited because cassava has a highly heterozygous, repetitive and difficult to assemble genome. Findings: Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near complete haplotype resolution with higher continuity and accuracy compared to conventional long sequencing reads. We present two chromosome scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. Genome comparisons revealed extensive chromosome re-arrangements and abundant intra-genomic and inter-genomic divergent sequences despite high gene synteny, with most large structural variations being LTR-retrotransposon related. Allele-specific expression analysis of different tissues based on the haplotype-resolved transcriptome identified both stable and inconsistent alleles with imbalanced expression patterns, while most alleles expressed coordinately. Among tissue-specific differentially expressed transcripts, coordinately and biasedly regulated transcripts were functionally enriched for different biological processes. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding. Conclusions: The haplotype-resolved genome allows the first systematic view of the heterozygous diploid genome organization in cassava. The completely phased and annotated chromosome pairs will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy and continuity.
Shaffer, W.; Papin, V.; Yadav, S.; Voss-Fels, K. P.; Hickey, L.; Hayes, B.; Dinglasan, E. G.
Quantitative trait loci (QTL) discovery studies on diversity panels or breeding populations typically use genome-wide association studies (GWAS) to estimate marker effects. For plant and animal breeding applications, researchers increasingly recognize the potential benefits of identifying superior haplotypes (markers in linkage disequilibrium; LD) rather than relying on single markers, as traditional approaches inefficiently account for cumulative signals from incomplete LD with QTL or split effects when multiple markers are in high LD with QTL. Using the genomic prediction framework, the local GEBV (localGEBV) method was developed in animal breeding and has been adopted in crop haplotype mapping studies; however, no study has thoroughly quantified the utility of this method or systematically compared outcomes to traditional GWAS approaches. Here, we characterized a strategy to group markers in chromosomal segments based on LD (haplotype blocks or haploblocks), computed localGEBV as a linear contrast of marker effects within each haploblock, and utilised the variance of localGEBV to enhance QTL discovery compared to traditional GWAS. Marker effects for localGEBV were estimated with ridge-regression best linear unbiased prediction (rrBLUP) and BayesR, with results compared to two common GWAS approaches. Using the barley row-type trait, we demonstrated that localGEBV improved QTL discovery and phenotypic prediction compared to single markers. Furthermore, localGEBV results were robust to the choice of prior marker assumptions and blocking parameters, enabling flexibility in fine or broad-scale QTL mapping. Overall, our findings establish localGEBV as a haplotype-based strategy capable of leveraging localized genomic effects to improve QTL discovery and, potentially, genomic selection.
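The localGEBV calculation described in this abstract can be sketched as follows. This is a toy illustration, assuming genotypes coded as 0/1/2 allele counts and marker effects that have already been estimated; the effects below are made-up numbers rather than rrBLUP or BayesR output, and the function names and block labels `hb1` and `hb2` are hypothetical.

```python
from statistics import pvariance


def local_gebv(genotypes, effects, block):
    """Per-individual sum of genotype x marker effect within one haploblock."""
    return [sum(row[j] * effects[j] for j in block) for row in genotypes]


def block_variances(genotypes, effects, blocks):
    """Variance of localGEBV across individuals for every haploblock."""
    return {name: pvariance(local_gebv(genotypes, effects, idx))
            for name, idx in blocks.items()}


# Four individuals x four markers, coded as allele counts (hypothetical data).
genotypes = [
    [0, 0, 2, 0],
    [2, 2, 0, 2],
    [0, 0, 0, 2],
    [2, 2, 2, 0],
]
# Markers 0 and 1 are in complete LD and split one signal between them.
effects = [0.5, 0.5, 0.3, -0.3]
blocks = {"hb1": [0, 1], "hb2": [2, 3]}

variances = block_variances(genotypes, effects, blocks)
print(variances["hb1"])  # 1.0
print(variances["hb2"])  # about 0.36
```

Summing within `hb1` recovers the signal that is split across its two markers, which matches the motivation the abstract gives for ranking haploblocks by localGEBV variance instead of ranking single markers.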
Vourlaki, I.-T.; Ramos-Onsins, S. E.; Perez-Enciso, M.; Castanera, R.
Structural variants (SVs) such as deletions, inversions, duplications, and Transposable Element (TE) Insertion Polymorphisms (TIPs) are prevalent in plant genomes and have played an important role in evolution and domestication, as they constitute a significant source of genomic and phenotypic variability. Nevertheless, most methods in quantitative genetics focusing on crop improvement, such as genomic prediction, consider Single Nucleotide Polymorphisms (SNPs) as the only type of genetic marker. Here, we used rice to investigate whether combining the structural and nucleotide genome-wide variation can improve prediction ability of traits when compared to using only SNPs. Moreover, we also examined the potential advantage of Deep Learning (DL) networks over Bayesian Linear models, which have been widely applied in genomic prediction. Specifically, the performance of BayesC and a Bayesian Reproducing Kernel Hilbert Space regression was compared to two different DL architectures, the Multilayer Perceptron and the Convolutional Neural Network. We further explored their prediction ability by using various marker input strategies and found that exploiting structural and nucleotide variation improves prediction ability on complex traits in rice. Also, DL models outperformed Bayesian models in 75% of the studied cases. Finally, DL systematically improved prediction ability of binary traits against the Bayesian models.
Bertolini, E.; Manjunath, M.; Ge, W.; Murphy, M. D.; Inaoka, M.; Fliege, C.; Eveland, A. L.; Lipka, A. E.
Plant architecture is a major determinant of planting density, which enhances productivity potential for crops per unit area. Genomic prediction is well-positioned to expedite genetic gain of plant architecture traits since they are typically highly heritable. Additionally, the adaptation of genomic prediction models to query predictive abilities of markers tagging certain genomic regions could shed light on the genetic architecture of these traits. Here, we leveraged transcriptional networks from a prior study that contextually described developmental progression during tassel and leaf organogenesis in maize (Z. mays) to inform genomic prediction models for architecture traits. Since these developmental processes underlie tassel branching and leaf angle, two important agronomic architecture traits, we tested whether genes prioritized from these networks quantitatively contribute to the genetic architecture of these traits. We used genomic prediction models to evaluate the ability of markers in the vicinity of prioritized network genes to predict breeding values of tassel branching and leaf angle traits for two diversity panels in maize, and diversity panels from sorghum (S. bicolor) and rice (O. sativa). Predictive abilities of markers near these prioritized network genes were similar to those using whole-genome marker sets. Notably, markers near highly connected transcription factors from core network motifs in maize yielded predictive abilities that were significantly greater than expected by chance in not only maize but also closely related sorghum. We expect that these highly connected regulators are key drivers of architectural variation that are conserved across closely related cereal crop species.
Article summary: We used an approach typically used for breeding to infer the contributions of biological gene networks to plant architectural traits. We found that markers near genes belonging to smaller, specialized gene networks from maize could predict breeding values of leaf angle better than expected by chance for both maize and sorghum.
Villwock, S. S.; Parkes, E. Y.; Nkouaya Mbanjo, E. G.; Rabbi, I.; Jannink, J.-L.
Cassava breeders aim to increase the provitamin A carotenoid content of storage roots to help combat vitamin A deficiency in sub-Saharan Africa, but a negative genetic correlation between total carotenoid (TC) and dry matter (DM) contents hinders breeding efforts. Genetic linkage between a major-effect variant in the phytoene synthase 2 (PSY2) gene and nearby candidate gene(s) has been thought to drive this correlation. Evidence from molecular experiments, however, suggest there may be a metabolic relationship between TC and DM, which we predicted would create genome-wide mediated pleiotropy. Bivariate genome-wide associations were used to test the hypothesis of pleiotropy and examine the genetic architecture of the negative covariance between TC and DM. A population of 378 accessions in the yellow-fleshed cassava breeding program at the International Institute of Tropical Agriculture (IITA) in Ibadan, Nigeria was genotyped with DArTseqLD. TC measured by iCheck spectrometer and DM data were available from field trials over ten years across three locations in Nigeria. Mixed linear models controlling for the previously-identified PSY2 causal variant were used to identify multiple new quantitative trait loci (QTL) jointly associated with both traits. The majority of 17 jointly-associated loci identified at a relaxed significance threshold affected TC and DM in opposite directions, although this pattern did not reach statistical significance in a binomial test. Even after accounting for the effects of these 17 loci as covariates, there was significantly negative polygenic covariance between TC and DM remaining. These findings support the hypothesis that mediated pleiotropy rather than genetic linkage drives the negative genetic correlation between TC and DM in cassava and demonstrate a new application of multivariate GWAS for interrogating the genetic architecture of correlated traits. 
Plain language summary: Increasing provitamin A in cassava roots has reduced their dry matter content, making vitamin-enriched cassava varieties less desirable. This study used multi-trait models to identify shared genetic factors, most of which had opposing effects on the two traits. The negative relationship was distributed across the genome, suggesting an inherent physiological trade-off. These findings will guide breeders in developing selection strategies for vitamin-enriched cassava and other starchy crops. More broadly, this study demonstrates the use of multi-trait associations to help distinguish whether traits are associated because of separate, nearby genes (genetic linkage) or because the same genes affect multiple traits (pleiotropy).
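The binomial test mentioned in the abstract above can be illustrated with a short sketch. The abstract does not report how many of the 17 loci had opposite-direction effects, so the count of 12 used here is purely hypothetical, as is the function name; the null hypothesis assumed is that each locus is equally likely to affect TC and DM in the same or in opposite directions.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed one under the null."""
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical count: 12 of 17 loci with opposite-direction effects.
# The real count is not given in the abstract.
p_value = binomial_two_sided_p(successes=12, trials=17)
print(f"two-sided p-value: {p_value:.4f}")
```

With only 17 loci, a clear majority can still fall short of conventional significance, which is consistent with the pattern the authors describe.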
Ramasubramanian, V.; Beavis, W. D.
Herein we report the impacts of applying five selection methods across 40 cycles of recurrent selection and identify interactions among factors that affect genetic responses in sets of simulated families of recombinant inbred lines derived from 21 homozygous soybean lines. Our use of a recurrence equation to model response to recurrent selection allowed us to estimate the half-lives and asymptotic limits of that response, and so to assess the rates of response and the future genetic potential of populations under selection. The simulated factors include selection methods, training sets, and selection intensity, which are under the control of the plant breeder, as well as genetic architecture and heritability. A factorial design to examine and analyze the main and interaction effects of these factors showed that both the rates of genetic improvement in the early cycles and the limits to genetic improvement in the later cycles are significantly affected by interactions among all factors. Some consistent trends are that genomic selection methods provide greater initial rates of genetic improvement (per cycle) than phenotypic selection, but phenotypic selection provides the greatest long-term responses in these closed genotypic systems. Model updating with training sets consisting of data from prior cycles of selection significantly improved prediction accuracy and genetic response with three parametric genomic prediction models. Ridge Regression, if updated with training sets consisting of data from prior cycles, achieved better rates of response than BayesB and Bayes LASSO models. A Support Vector Machine method, with a radial basis kernel, had the worst estimated prediction accuracies and the least long-term genetic response.
Application of genomic selection in a closed breeding population of a self-pollinated crop such as soybean will need to consider the impact of these factors on trade-offs between short-term gains and conserving useful genetic diversity in the context of the goals for the breeding program.
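To make the ideas of a half-life and an asymptotic limit concrete, here is a minimal sketch of one possible recurrence. The abstract does not state the form of the authors' equation, so the first-order model below, R[c+1] = R[c] + rate * (limit - R[c]), together with its parameter values and function names, is an assumption for illustration only. Under this model the response approaches `limit` geometrically, and the half-life is the number of cycles needed to close half of the remaining gap.

```python
import math


def response_curve(r0: float, limit: float, rate: float, cycles: int) -> list[float]:
    """Iterate the assumed recurrence R[c+1] = R[c] + rate * (limit - R[c])."""
    values = [r0]
    for _ in range(cycles):
        values.append(values[-1] + rate * (limit - values[-1]))
    return values


def half_life(rate: float) -> float:
    """Cycles needed to close half of the remaining gap to the limit."""
    return math.log(0.5) / math.log(1.0 - rate)


# Hypothetical parameters: start at 0, asymptotic limit of 10 units,
# 10% of the remaining gap closed per cycle, 40 cycles as in the abstract.
curve = response_curve(r0=0.0, limit=10.0, rate=0.1, cycles=40)
print(f"response after 40 cycles: {curve[-1]:.3f}")
print(f"half-life: {half_life(0.1):.2f} cycles")
```

In this framing, a method with a short half-life but a low limit gains quickly and plateaus early, which is one way to read the short-term versus long-term trade-off the abstract reports.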
Halpin-McCormick, A.; Campbell, Q.; Negrao, S.; Morrell, P. L.; Hubner, S.; Neyhart, J.; Kantar, M. B.
The genetic basis of adaptation is a fundamental question in evolutionary genetics. Environmental association analysis (EAA) and various allele frequency comparisons in genomic environmental association (GEA) have become standard approaches for investigating the genetic basis of adaptation to natural environments. While these analyses provide insight into local adaptation, they have not been widely adopted in breeding or conservation programs. This may be attributable to the difficulty of identifying the best individuals for transplantation or relocation in conservation efforts, or the best parents in breeding programs. To explore the use of EAA and GEA for future breeding programs, we used a cereal crop, barley (Hordeum vulgare L.), as our case-study species because of its wide adaptability to different environments and agro-ecologies, ranging from marginal and low-input fields to highly productive farms. Here, we use publicly available data to conduct environmental genomic selection (EGS) on 753 landrace barley accessions using a mini-core of 31 landrace accessions and a de novo core of 100 as the training populations. Environmental genomic selection is to environmental association analysis (EAA) what genomic selection is to genome-wide association studies (GWAS). Since local adaptation to the environment is polygenic, a whole-genome approach is likely to be more accurate for selecting for environmental adaptation. Here we show distinct genetic backgrounds and population differences, and how an integrative approach coupling environmental genomic selection and species distribution modelling can help identify key parents for breeding for adaptation to specific environmental variables and geographies while minimizing linkage drag.
McGilp, L.; Millas, R.; Mickelson, A.; Shannon, L. M.; Kimball, J.
Cultivated Northern Wild Rice (Zizania palustris L.) is an obligately outcrossing, self-incompatible cereal grown in aquatic paddies in the United States. Genetic improvement has relied primarily on phenotypic recurrent selection, and genomic approaches remain largely unexplored in this emerging crop. We applied a single-plant genome-wide association study (sp-GWAS) framework to dissect vegetative architecture traits in five open-pollinated cultivated populations evaluated across three years (n = 2,173 plants). Plant height (PH), basal stem width (BSW), primary stem width (PSW), flag leaf length (FLL), and flag leaf width (FLW) were analyzed using a mixed linear model accounting for population structure and kinship. Broad-sense heritability ranged from 0.03 to 0.34, and year effects explained up to 54% of phenotypic variance, indicating strong environmental influence. After filtering 73,363 SNPs, genome-wide linkage disequilibrium decayed rapidly (r² = 0.1 at ~2.3 kb). A total of 124 significant SNPs (FDR < 0.01) were consolidated into 98 loci, of which 46 were associated with multiple traits and 11 were shared across four traits. Candidate genes near multi-trait loci included conserved regulatory classes implicated in grass architecture, including HLH/bHLH transcription factors. Diplotype analyses at candidate loci revealed both simple biallelic and complex multi-allelic haplotype structures, indicating that locus-level haplotype effects underlie several GWAS signals. Results demonstrate that sp-GWAS can detect statistically robust associations in a highly heterozygous, non-replicable crop system and suggest a polygenic, coordinated genetic architecture governing vegetative growth. These findings support genomic prediction and multi-trait selection strategies to accelerate improvement of cultivated Northern Wild Rice.
Plain language summary: Cultivated Northern Wild Rice is an important specialty crop grown in flooded paddies in the United States.
Unlike many major crops, it is naturally outcrossing and highly variable, which makes traditional breeding challenging and slow. Most improvement efforts have relied on selecting plants based only on how they look in the field, and genomic tools have rarely been used. In this study, we used DNA markers to better understand the genetics behind plant structure traits such as plant height, stem thickness, and leaf width. We evaluated more than 2,000 plants from five cultivated populations over three growing seasons. Because weather and growing conditions strongly influence these traits, we used statistical models to separate environmental effects from genetic effects. We identified 98 regions of the genome associated with variation in plant structure. Many of these regions influenced more than one trait, showing that plant height, stem strength, and leaf size are genetically connected. Several regions contained genes similar to those known to control plant growth and development in other grasses. We also found that, in some cases, combinations of nearby DNA variants (haplotypes) explained trait differences better than single genetic markers. Overall, this work shows that modern genomic tools can successfully identify useful genetic variation in cultivated Northern Wild Rice, even though it is highly outcrossing and genetically diverse. These results provide a foundation for using genomic selection to improve plant structure, lodging resistance, and overall performance in breeding programs.
Core ideas:
- Single-plant GWAS successfully detects genetic associations in obligately outcrossing cultivated Northern Wild Rice where conventional replicated mapping populations are impractical.
- Vegetative architecture traits exhibit low heritability but retain recoverable polygenic signal, where nearly half of detected loci influence multiple architecture traits, indicating integrated developmental control.
- Genome-wide linkage disequilibrium decays rapidly (~2.3 kb), consistent with expectations for an obligately outcrossing species and supporting relatively localized association signals.
- Candidate genes include conserved regulatory classes (TE1-like, HLH/bHLH, SPL).
- Given extensive overlap between QTL and environmental effect, multi-trait, multi-environment genomic prediction provides a pragmatic breeding strategy to improve canopy efficiency, lodging resistance, and harvestability in aquatic production systems.
Lin, Y.-C.; Urbany, C.; Shlykova, A.; Hoelker, A.; Ouzunova, M.; Prester, T.; Pook, T.; Mayer, M.; Urzinger, S.; Schoen, C. C.
Securing sustainable crop production requires the genetic improvement of abiotic stress tolerance. Due to the broad range of environmental factors causing abiotic stress and complex genotype-by-environment interactions, it is crucial to understand the genetic basis of crop yield under suboptimal conditions. Here, we developed a dent maize Multi-parent Advanced Generation Inter-Cross (MAGIC) population comprising 388 doubled haploid (DH) lines. The population was derived from eight founders with varying stress tolerance, selected from a dent diversity panel evaluated for yield performance across a wide range of European environments. The MAGIC DH lines were genotyped via whole-genome sequencing (~5X coverage) and evaluated in seven testcross and 14 line per se trials, for grain dry matter yield, leaf senescence, leaf rolling, anthesis-silking interval, and six additional agronomic traits. Genetic dissection identified 22 grain yield QTL, explaining 45% of the genetic variance. Under heat and drought stress, testcross grain yield correlated significantly with leaf senescence and leaf rolling measured in line per se trials. Bivariate multi-trait analysis showed that alleles for delayed senescence and reduced rolling at detected QTL generally exhibited positive effects on grain yield, suggesting that accumulating these favorable alleles could enhance yield performance. Incorporating these proxies into multi-trait genomic prediction models improved yield prediction accuracy, although gains were constrained by modest trait correlations. Given the comprehensive data, we also provide recommendations for optimizing sequencing depth and QTL mapping strategies in experimental maize populations.
Key message: This eight-founder MAGIC population represents a powerful resource for dissecting complex traits in maize, assessing the utility of drought proxy traits, and optimizing low-coverage whole-genome sequencing approaches.